Assembly and comparative analysis of the initial complete mitochondrial genome of Primulina hunanensis (Gesneriaceae): a cave-dwelling endangered plant

Background Primulina hunanensis, a troglobitic plant within the Primulina genus of Gesneriaceae family, exhibits robust resilience to arid conditions and holds great horticultural potential as an ornamental plant. The work of chloroplast genome (cpDNA) has been recently accomplished, however, the mitochondrial genome (mtDNA) that is crucial for plant evolution has not been reported. Results In this study, we sequenced and assembled the P. hunanensis complete mtDNA, and elucidated its evolutionary and phylogenetic relationships. The assembled mtDNA spans 575,242 bp with 43.54% GC content, encompassing 60 genes, including 37 protein-coding genes (PCGs), 20 tRNA genes, and 3 rRNA genes. Notably, high number of repetitive sequences in the mtDNA and substantial sequence translocation from chloroplasts to mitochondria were observed. To determine the evolutionary and taxonomic positioning of P. hunanensis, a phylogenetic tree was constructed using mitochondrial PCGs from P. hunanensis and 32 other taxa. Furthermore, an exploration of PCGs relative synonymous codon usage, identification of RNA editing events, and an investigation of collinearity with closely related species were conducted. Conclusions This study reports the initial assembly and annotation of P. hunanensis mtDNA, contributing to the limited mtDNA repository for Gesneriaceae plants and advancing our understanding of their evolution for improved utilization and conservation. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-024-10247-9.


Introduction
Primulina hunanensis, a perennial herbaceous member of the Gesneriaceae family, thrives in low-light, barren cave environments, and exhibits distinctive environmental adaptation characteristics [1].Renowned for its large green leaves, vibrant flowers, and elegant plant structure, this species has huge potential as an ornamental plant.Moreover, its capacity for asexual reproduction through leaf cuttings adds significant value to horticultural cultivation [2].Despite these attributes, the unique habitat of P. hunanensis, confined to a single county in Hunan Province, China, makes it a species of conservation concern due to its restricted distribution and small population size.Conducting genomic research on P. hunanensis holds profound practical significance, aiming to promote the effective utilization of its germplasm resources.
Mitochondria plays a crucial role in oxidative metabolism within eukaryotic cells, through energy synthesis and conversion across the organismal life processes.Besides energy provision, mitochondria are involved in diverse physiological activities, including cellular information transmission, regulation of cell division, differentiation, and apoptosis [3][4][5].Aligned with endosymbiosis theory, the evolution of chloroplasts and mitochondria began from their ancestral existence as free-living prokaryotes to their current status as specialized organelles with distinct functionalities [6][7][8].Unlike nuclear genome in the majority of seed plants that is derived from biparental contributions, chloroplasts and mitochondria genomes have an exclusive maternal inheritance pattern [9].This uniparental mode reduces the difficulty of genetic research, and therefore, organelle genomes are widely used in phylogenetics and evolutionary biology [10].
The first mtDNA within terrestrial plant Marchantia polymorpha was sequenced in 1992 [11], and since then, mtDNAs of various plants have been extensively studied.MtDNAs have been sequenced in dicotyledonous plants like Brassica napus [12], Arabidopsis thaliana [13], and Beta vulgaris [14], as well as monocotyledonous plants like Oryza sativa [15].In contrast to cpDNAs, plant mtDNAs exhibit complex structure, including branching linear, unilinear, cyclic, and a combination of cyclic and linear [16,17].Generally, the mtDNA size within angiosperms ranges from 200 Kb to 3 Mb, with considerable variations evident not only across distinct plant species but also within genus [18].These variations primarily arise from exogenous DNA elements integration, as well as prevalent repetitive sequences [19].
While plant mtDNA demonstrates considerable size and structural variation, genes exhibit relative conservation.Cytochrome c oxidase genes, NADH dehydrogenase genes, and ATP synthase genes, are few among many genes present in the vast majority of plants [20].Key characteristics of plant mtDNA include sparsely distributed genes, abundant noncoding sequences, a wide array of repetitive sequences, and extensive RNA editing [21].These characteristics contribute to the instability of mtDNA structure, making assembly more challenging.Consequently, the sequencing of mtDNA lags behind that of cpDNAs due to its complexity [22].Nevertheless, related reports on plant mtDNAs are expected to increase rapidly as sequencing and assembly technologies advance.
Within the family Gesneriaceae, mtDNA of only two species: Boea hygrometrica [23] and Haberlea rhodopensis [24], have been reported.In this study, we report the initial complete mtDNA of P. hunanensis, complementing the cpDNA sequence previously reported (https:// www.ncbi.nlm.nih.gov/, ON456288.1).Our results contribute to the growth of the mitochondrial DNA database specific to the Primulina genus, providing crucial genetic data essential for the conservation and cultivation of species within the Gesneriaceae family.
We used Bandage software to visualize the schematic of the mtDNA assembled based on long-reads data (Fig. 2A): the schematic contained six contigs, the naming, sequence length, and sequencing depth of contigs were shown in Table S1, and each contig represented a sequence obtained from the assembly, and if two contigs were connected by a black line, it means that there was an overlapping region between the two contigs.All these contigs constituted a complex multibranched closed structure that represented the complete mtDNA of the species.For several key nodes where branches exist, we used the Unicycler software to solve them on the basis of long-reads.The basic principle of Unicycler to solve duplicate regions is to map the relevant sequences at branching nodes onto long-reads, and when two sequences connected along the black line appear in the same long-read, it means that the long-read supports the connection of these two sequences.For branching nodes where there are multiple connections, those connections supported by more long-reads are preferred.The genome sequence of one line obtained after solving the branch nodes based on Unicycler was shown in Fig. 2B, with a specific solution path of 3-5_copy-2-5-4-6-1 (Table S2).Of course, we emphasize that the processing here is not the only form, because the structure of plant mtDNAs is not unique, but is in the midst of dynamic changes.This processing was chosen to facilitate the subsequent analysis.

PCGs codon usage analysis
PCGs cumulative length in P. hunanensis amounted to 53,589 bp.Predominantly, to most PCGs, ATG was the initial codon, with the exception of ACG for rps10 and GTG for rpl16.While the termination codons for most PCGs were TAA, TGA, or TAG, the termination codon  Ribosomal protein small subunit rps3 a , rps4, rps7, rps10 a , rps12, rps13, rps14

RNA editing sites prediction
Deepred-mt software predictions identified RNA editing events on 37 unique PCGs from P. hunanensis mtDNA.At a predictive performance of 0.9 (0 indicates no editing, while 1 represents editing.The closer the value is to 1, the higher the likelihood of editing occurrence), a total of 455 potential RNA editing sites were identified across the 37 mitochondrial PCGs, uniformly characterized by C to U conversions (Table S11).Notably, ccmB gene had the highest RNA editing sites frequency, totaling 37 edits.The nad4 gene displayed 36 RNA editing events, whereas both rpl2 and rps7 had only a single editing site (Fig. 7).A total of 43 codon variations were implicated in RNA editing sites, of which 45.94% of the amino acids had no alteration in hydrophobicity.On the other hand, 47.03% were predicted to undergo a transition from a hydrophilic to a hydrophobic state, while 6.59% were anticipated to shift from hydrophobic to hydrophilic (Table S12).Furthermore, RNA editing events were found to potentially introduce stop codons in the atp6 and rps10 genes.

Collinearity analysis
The collinearity analysis, excluding collinear blocks with lengths < 0.5 kb revealed numerous homologous between  S13).Additionally, there were unique blank regions in P. hunanensis sequences, exhibiting no homology to other species (Fig. 8).The results showed that the mtDNA alignments of the eight species of Labiatae were inconsistent, and that the mtDNA of P. hunanensis underwent extensive genomic rearrangements with its close relatives.

Characteristics of the mtDNA of P. hunanensis
Currently, complete mtDNA have only been reported for B. hygrometrica and H. rhodopensis within the Gesneriaceae.In this study, we have assembled and determined that P. hunanensis mtDNA size was 575,242 bp, slightly longer than that of B. hygrometrica (510,519 bp) and H. rhodopensis (484,138 bp), with 43.54% GC content, similar to B. hygrometrica (43.27%) and H. rhodopensis (44.1%) [23,24] Most land plants typically possess three rRNA genes [25], and we identified the same three rRNA genes (rrn18, rrn26, and rrn5) in the mtDNA of P. hunanensis.
The complexity and variations within plant mtDNA structure present persistent challenges for assembly and annotation efforts [21].Unlike the circular structure presented by the mtDNAs of B. hygrometrica and H. rhodopensis, in this study, we chose to describe the mtDNA of P. hunanensis in terms of a linear structure supported by more long-reads, which does not imply that its actual structure is linear, but is more likely to exist in different conformations.Although most of the previous studies described the mtDNA structure in terms of a circular shape, the plant mtDNA is supposed to be assembled as a complex dynamic linear DNA collection of smaller circular and branched DNA molecules [26][27][28].The results of this study further support the idea that multiple types of structures exist in the mtDNA of plants.

Codon usage analysis of PCGs and repeat sequences
The PCGs in the mtDNA of P. hunanensis accounted for only 9.32% (53,589/575,242) of the total sequence length, potentially reflecting an accumulation of repetitive sequences during the evolutionary process [29].In the mtDNA of organisms, ATG and GTG are common start codons, and TAA, TAG, and TGA are typical stop codons [30,31].Among the mitochondrial PCGs of P. hunanensis, the rpl16 gene has GTG as the start codon, and this phenomenon has also been widely reported in other plants such as maize [30].In addition, ACG as the start codon of the rps10 gene, and CGA and CAA as the stop codons of the rps10 and atp6 genes, respectively, are thought to be the possible result of RNA editing modifications [32].Analysis of codon usage through RSCU unveiled a notable adenine/thymine (A/T) richness at 3rd codon position within P. hunanensis mtDNA.The prevalence of NNA and NNT codons, with RSCU values equal to or surpassing 1.00, underscores a pronounced bias towards AT nucleotides at the third codon-a prevalent pattern observed in higher plants [33].
Repetitive elements, comprising simple repeats, tandem repeats, and dispersed repeats, are prevalent in plant mtDNAs and exert a significant influence on mitochondrial evolution [34,35].For SSRs, monomerics were dominated by A/T repeats, constituting 93.55% (29/31), while dimeric SSRs prominently featured AT/TA repeats, accounting for 56.52% (13/23).The high A/T composition in SSRs likely contributes to the overall AT richness observed in P. hunanensis mtDNAs.This aligns with broader trends seen in land plants, reflecting a consistent evolutionary trajectory within plant organelle genomes base composition [20,33,36].The total length of the dispersed repeats was 13,887 bp, accounting for 2.41% (13,887/575,242) of the total mtDNA length.Predominantly, the dispersed repeats spanned 30-50 bp (216, 82.44%), with the longest palindromic repeat (P) measuring 147 bp, as well as the longest forward repeat (F) extending to 2,919 bp.These repetitive sequences not only contribute to mtDNA size expansion but also signify the frequent occurrence of intermolecular recombination in mtDNA [19,25].

Homology analysis
Angiosperms display extensive genetic material exchange between organelle genomes, through the transfer of DNA from the chloroplast to the mtDNA, known as MTPTs [37].These fragments typically constitute 1-12% total mtDNA length.The mtDNA length of grape (Vitis vinifera) was 773,279 bp, with cpDNA fragments accounting for about 42% of the total, including 30 chloroplast PCGs and 17 tRNA genes [38].Horizontal gene transfer is also documented between organelles and the nuclear genome [39,40].We found 65,489 bp of homologous sequences in the mtDNA and cpDNA of P. hunanensis, representing approximately 11.38% of the total mtDNA length.Annotation of these homologous sequences unveiled incomplete fragments originating from the cpDNA.Previous research indicates that, such homologous fragments vary among species in length and sequence [41].Furthermore, genes migrating from chloroplasts to mitochondria often become pseudogenes over time, lacking functional significance, possibly due to sequence recombination [26,42].In this study, 11 intact tRNA genes, constituting 55% (11/20) of the total P. hunanensis mtRNA genes, were reported.The tRNA gene transference from chloroplasts to mitochondria is a recurring theme in plant evolution.Notably, as plants ascend from lower to higher taxa, the frequency of such tRNA transfers intensifies, aligning with the increasing demands for protein synthesis [43,44].The cpDNA of P. hunanensis contributes numerous sequences to the mitochondrial genome, enhancing the diversity of its mtDNA.Yet, the mechanisms driving sequence migration between the genomes and the expression of genes within migrated sequences remains unknown, warranting further investigation.

RNA editing sites prediction
RNA editing involves alterations in amino acid composition during protein translation.This results in differences between the amino acids constituting the translated protein and the encoded information in the gene sequence, arising from nucleotide deletions, insertions, or substitutions in the transcribed mRNA molecule [45] Plant RNA editing predominantly takes the form of C-U conversions [46,47].Notably, RNA editing can induce changes in start and stop PCGs codons.For instance, rps10 exhibits an ACG start codon and a CGA stop codon, while atp6 has a CAA stop codon, alterations potentially attributed to RNA editing.Typically, new start and stop codons derived from RNA editing leads to the production of more evolutionarily conserved proteins, displaying high homology with corresponding proteins in other species.This conservation enhances the expression of mitochondrial genes [48].Numerous RNA editing events culminate in amino acid changes, mostly converting hydrophilic amino acids to hydrophobic ones.In the mitochondria of P. hunanensis, 47.03% of amino acids in PCGs were predicted to undergo such changes following RNA editing.Hydrophobicity significantly influences Fig. 8 Collinearity analysis of P. hunanensis.The bar graph illustrates the mtDNA, with ribbons indicating homologous sequences between adjacent species.Red arcs signify inversion regions, gray areas denote regions of strong homology, and blank regions represent sequences unique to each species protein folding, ensuring optimal protein conformation and functionality [49].
Earlier studies have reported 419 RNA editing sites in 35 PCGs of H. rhodopensis, 389 in 32 B. hygrometrica PCGs [24].In this study, the 455 RNA editing sites identified were distributed across 37 PCGs within the mtDNA of P. hunanensis.RNA editing influences plant growth and development, constituting an essential feature of gene expression in plant mitochondrial genomes.Unedited RNA translation into protein typically results in severe or even lethal phenotypic outcomes [50,51].The exploration of RNA editing sites contributes to unraveling gene expression mechanisms within both chloroplasts and mitochondria, offering vital insights for predicting gene function [52,53].

Phylogenetic analysis and collinearity
The mtDNA has gained increased attention in phylogenetic analysis, especially with sequencing and assembly technologies rapid progression [33,54].In this study, P. hunanensis, B. hygrometrica, and H. rhodopensis, belonging to the Gesneriaceae family, formed a closely related cluster.Moreover, P. hunanensis displayed a closer phylogenetic relationship with B. hygrometrica compared to other relatives.It is worth noting that mitochondria, regarded as a relatively autonomous genetic system, might poses mitochondrial genes acquired from divergent plant species [55].Consequently, constructing a phylogenetic tree solely based on mtDNA may not precisely mirror the accurate phylogenetic relationships [56,57].To this regard, further studies are needed incorporating the emerging understanding of mtDNA nature.
Homology analysis began in the 80s and prove crucial in elucidating species evolution [58][59][60][61].Our current study unraveled numerous homologous collinear blocks shared between P. hunanensis and seven species within the order Lamiales.The observed disorder in the arrangement of these collinear blocks hints at extensive genomic rearrangements in the mtDNA of P. hunanensis, a phenomenon reported to be a potential driver of evolution and mtDNA diversity in P. hunanensis [62,63].

Conclusion
In this study, we successfully assembled and annotated the entire mtDNA of P. hunanensis, with a length of 575,242 bp and a GC content of 43.54%.A total of 60 genes, comprising 37 PCGs, 20 tRNA genes, and 3 rRNA genes were annotated.Our comprehensive analysis encompassed codon preference, repetitive sequences, genome homology, RNA editing events, phylogenetic relationships, and beyond.The data provided in this paper further support the idea that plant mtDNAs exist in multiple conformations, clarify the mitochondrial genomic features of P. hunanensis, and expand the mitochondrial genomic database of Primulina plant.
This research offers comprehensive insights into the organelle genetic characteristics and phylogenetic relationships of P. hunanensis, serving as an essential reference for subsequent investigations into the genomes of Gesneriaceae plants.

Plant materials, DNA extraction and sequencing
Plant specimens used in this study were sourced from our curated collection of P. hunanensis, harvested in June 2022 from Jianghua Yao Autonomous County, Hunan Province, China.A corresponding voucher specimen (K.M. Liu and X. Z. Cai 31,330 (HNNU)) has been deposited in the Herbarium of Hunan Normal University.
Leaves collected from the natural habitats of the species, and from plants cultivated in the botanical gardens were dried using silica gel.Total DNA was extracted using a modified CTAB method [64], and evaluated for quality, quantity and purity using NanoDrop spectroscopy (Thermo Fisher Scientific) and 1% agarose gel electrophoresis.Library construction was conducted using the Truseq Nano DNA HT Sample Preparation Kit (Illumina USA) according to manufactures instructions.In brief, the DNA underwent sonication and fragmentation to 350 bp, with subsequent PCR amplification.Purified PCR products were obtained by the use of AMPure XP purification kit, while library's size distribution was estimated using Agilent 2100 Bioanalyzer.real-time PCR was used for quantification.Sequencing was performed using paired-end PE-150 bp on the Illumina HiSeq 2500 platform.Concurrently, the same DNA sample underwent single-molecule real-time sequencing using Nanoporebased ONT (Oxford Nanopore Technologies).This process yielded a total of 12.13 Gb of sequence reads, with 9.472 kb mean read length for sequencing and an N50 of 22.276 kb.Both the library construction and sequencing procedures were performed at Wuhan Benagen Technology Company.

Assembly and annotation of mtDNA
Assembly of mtDNA: (1) The mtDNA of P. hunanensis was assembled using long-reads sequencing data.The assembly was performed using Flye software with default parameters [65], and graphical assembly results in GFA format were obtained.To identify contig fragments containing mtDNA, we built a database with the assembled contigs in FASTA format using makeblastdb [66].The BLASTn algorithm was used, utilizing conserved mitochondrial genes from Arabidopsis thaliana as query sequences.The specific parameters used were "-evalue 1e-5, -outfmt 6, -max_hsps 10, -word_size 7, -task blastn-short" [67].The resultant GFA files were visualized in Bandage software (v0.8.1) [68].Subsequently, long-read data were aligned to graphical mtDNA fragments via BWA software (v0.7.17) [69].The mitochondrial long-reads were selectively extracted to facilitate repetitive sequence regions within the plant's mtDNA indepth analysis.
(2) Simultaneously, the short-reads obtained from illumina were aligned to the mtDNA contigs acquired in step (1) through BWA software (v0.7.17) [69], resulting in the extraction of short-reads specific to mtDNA.The final assembly of mtDNA was performed using both the short-reads and long-reads in Unicycler [70].

RNA editing sites prediction
We predicted the RNA editing sites from all PCGs encoded within the mtDNA of P. hunanensis, specifically focusing on the conversion from cytosine (C) to uracil (U) using Deepred-mt tool [47].All sites above a probability threshold of 0.9 were retained.
project management and financial support.All authors read and agreed to the final manuscript.

Fig. 3
Fig. 3 MtDNA relative synonymous codon usage of P. hunanensis.The x-axis represents codon families, with RSCU values representing specific codon frequency in comparison to uniform synonymous codon usage expected frequency

Fig. 4
Fig. 4 Histogram of repetitive sequence analysis.A The horizontal axis represents SSR types, while the vertical axis shows the number of repetitive fragments.The legend utilizes colors for clarity.B On the horizontal axis shows the types repeats, while the vertical axis illustrates the number of repeats.The colors distinguish the categories

Fig. 5 Fig. 6 Fig. 7
Fig. 5 Homology analysis of the mitochondrial and chloroplast genomes of P. hunanensis.The grey arcs symbolize the mtDNA, the light green arcs denote the cpDNA, and blue lines connecting arcs signify homologous sequences

Table 1
The mtDNA encoding genes of P.